Text Line for Historical Document Images
Abstract
In this paper we present a new approach to text line segmentation that works directly on gray-scale document images. Our algorithm constructs a distance transform directly on the gray-scale image and uses it to compute two types of seams: medial seams and separating seams. A medial seam is a chain of pixels that crosses the text area of a text line, and a separating seam is a path that passes between two consecutive text lines. The medial seam determines a text line, and the separating seams define its upper and lower boundaries. Both types of seams propagate according to energy maps defined on the constructed distance transform. We have performed experiments on several datasets and obtained encouraging results.
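The abstract does not spell out how a seam follows its energy map. One common way to obtain such a left-to-right path, borrowed from seam carving, is dynamic programming over a 2D energy array: columns are visited left to right, and the path may move at most one row up or down per step. The sketch below assumes a precomputed energy map (a list of rows) in which lower values are more attractive. For a separating seam this could, for example, be the negated distance to the nearest ink, but that energy definition, the function name `horizontal_seam`, and the use of dynamic programming are illustrative assumptions, not the authors' method.

```python
from typing import List


def horizontal_seam(energy: List[List[float]]) -> List[int]:
    """Return, for each column x, the row of a minimum-cost left-to-right path.

    Hypothetical helper: each step moves one column right and at most one
    row up or down (8-connected), as in seam carving.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[y][x]: cheapest path ending at (y, x); back[y][x]: row it came from.
    cost = [[0.0] * cols for _ in range(rows)]
    back = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        cost[y][0] = energy[y][0]
    for x in range(1, cols):
        for y in range(rows):
            # Prefer staying on the same row; switch only if strictly cheaper.
            best_row = y
            for dy in (-1, 1):
                ny = y + dy
                if 0 <= ny < rows and cost[ny][x - 1] < cost[best_row][x - 1]:
                    best_row = ny
            cost[y][x] = energy[y][x] + cost[best_row][x - 1]
            back[y][x] = best_row
    # Start from the cheapest end point in the last column and walk back.
    y = min(range(rows), key=lambda r: cost[r][cols - 1])
    seam = [0] * cols
    for x in range(cols - 1, -1, -1):
        seam[x] = y
        y = back[y][x]
    return seam


if __name__ == "__main__":
    # Toy energy map: a low-energy channel that steps down one row midway.
    toy = [
        [9, 9, 9, 9],
        [1, 1, 9, 9],
        [9, 9, 1, 1],
        [9, 9, 9, 9],
    ]
    print(horizontal_seam(toy))  # [1, 1, 2, 2]
```

On the toy map the seam stays inside the low-energy cells, stepping down one row between the second and third columns. The full cost table takes O(rows × cols) time and memory.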